Corpus-Based Statistics-Oriented (CBSO) Machine Translation Researches in Taiwan
Abstract
This paper gives a brief introduction to the machine translation (MT) research projects in Taiwan. Special attention is given to the increasingly popular corpus-based statistics-oriented (CBSO) approaches in MT research. In particular, the paper introduces the parameterized two-way training philosophy behind the design of the second-generation BehaviorTran, which is the first and the largest operational system in this area.

1. Overview of the Machine Translation Projects in Taiwan

The first MT research project in Taiwan (the ArchTran MT system, also known as the BehaviorTran system) was conducted in 1985 at National Tsing-Hua University [Hsu 86]. MT-related projects were subsequently started at many other academic institutions and in the private sector, including the Institute of Computer Science and Information Engineering of National Taiwan University, the Industry Technology Research Institute (ITRI), Matsushita Electric Institute of Technology (Taiwan), Wang Corporation (Taiwan), the Institute of Information Engineering of National Chiao-Tung University, the Institute of Information Science, National Tsing-Hua University, and so on. Related NLP research is also conducted at other institutions, such as the Institute of Information Science, Academia Sinica, and the Telecommunication Technology Laboratory.

Because of the rapid growth of this research, an annual conference, the ROCLING conference, was initiated in 1988; the ROCLING Society (ROC Computational Linguistics Society) was also founded as a standing organization for promoting NLP research of all kinds, including machine translation. Many workshops have been held on an irregular basis since the founding of the ROCLING Society. In particular, SIGMT, for machine translation, and special MT workshops were organized for those interested in this field.

Since the BehaviorTran MT system was the first MT project in this area, and is also the largest operational system in this area, the experience gained from it has had a significant influence on other related research. In fact, the adoption of corpus-based statistics-oriented approaches